Page Table Walker

The page table walker (PTW) is a critical component of the memory management unit (MMU) that translates virtual addresses to physical addresses. The PTW is responsible for walking the page table hierarchy to find the physical address corresponding to a given virtual address. This process involves multiple memory accesses and can be time-consuming, especially in systems with large page tables.

Overview

The mmu_base.cc is responsible for calculating the latency of the page table walk operation. The page table walk latency is calculated based on the number of memory accesses required to traverse the page table hierarchy and the latency of each memory access.

The function MemoryManagementUnitBase::performPTW is called to perform the page table walk for a given virtual address. This function triggers the page table walk operation and returns the physical address, the page size, the time taken for the page table walk, and whether a page fault occurred.

/**
 * @brief Perform a Page Table Walk (PTW) for a given address.
 *
 *
 * @param address The virtual address for which the PTW is performed.
 * @param modeled A boolean indicating whether the PTW should be modeled.
 * @param count A boolean indicating whether the PTW should be counted.
 * @param is_prefetch A boolean indicating whether the PTW is for a prefetch operation.
 * @param eip The instruction pointer (EIP) at the time of the PTW.
 * @param lock The lock signal for the core.
 * @return A tuple containing:
 *         - The time taken for the PTW (SubsecondTime).
 *         - Whether a page fault occurred (bool).
 *         - The physical page number (IntPtr) resulting from the PTW (at the 4KB granularity).
 *         - The page size (int).
 */

tuple<SubsecondTime, bool, IntPtr, int> MemoryManagementUnitBase::performPTW(IntPtr address, bool modeled, bool count, bool is_prefetch, IntPtr eip, Core::lock_signal_t lock, PageTable *page_table)
{
#ifdef DEBUG_MMU
    log_file_mmu << std::endl;
    log_file_mmu << "-------------------------------------" << std::endl;
    log_file_mmu << "[MMU_BASE] Starting PTW for address: " << address << std::endl;
#endif
  auto ptw_result = page_table->initializeWalk(address, count, is_prefetch);

The MemoryManagementUnitBase::performPTW function initializes the page table walk and filters out redundant memory accesses during the walk. What do we consider as redundant memory accesses? When a page table walk is performed and a page fault occurs, the page table walker will restart the walk from the root of the page table hierarchy and access the same memory locations again. These redundant memory accesses can be filtered out for simulation speed, as they do not affect accuracy significantly.
Why do we filter if the page table is radix? The page walk caches (PWC) are used to cache the results of page table walks to reduce the latency of subsequent walks. We filter out the page table entries that hit in the PWC to avoid redundant memory accesses.

ptw_result = make_tuple(get<0>(ptw_result), visited_pts, get<2>(ptw_result), get<3>(ptw_result), get<4>(ptw_result));

// Filter the PTW result based on the page table type
// This filtering is necessary to remove any redundant accesses that may hit in the PWC

if (page_table->getType() == "radix")
{
  filterPTWResult(ptw_result, page_table, count);
}

Now the core: MemoryManagementUnitBase::calculatePTWCycles function is called to calculate the latency of the page table walk operation. This function calculates the time taken for the page table walk based on the number of memory accesses required to traverse the page table hierarchy and the latency of each memory access.

int page_size = get<0>(ptw_result);
IntPtr ppn_result = get<2>(ptw_result);
bool is_pagefault = get<4>(ptw_result);


SubsecondTime ptw_cycles = calculatePTWCycles(ptw_result, count, modeled, eip, lock);

#ifdef DEBUG_MMU
  log_file_mmu << "[MMU_BASE] Finished PTW for address: " << address << std::endl;
  log_file_mmu << "[MMU_BASE] PTW latency: " << ptw_cycles << std::endl;
  log_file_mmu << "[MMU_BASE] Physical Page Number: " << ppn_result << std::endl;
  log_file_mmu << "[MMU_BASE] Page Size: " << page_size << std::endl;
  log_file_mmu << "[MMU_BASE] -------------------------------------" << std::endl;
#endif

return make_tuple(ptw_cycles, is_pagefault, ppn_result, page_size);

The MemoryManagementUnitBase::calculatePTWCycles may seem overly complex, but it is designed this way so that it can be used in a generic manner for every potential page table walk operation (e.g., page table walks for hash-based PTs).

If there is a page fault, the memory requests generated by the PTW should be sent to the cache hierarchy after the page fault handling latency. The Sim()->getMimicOS()->getPageFaultLatency() function is used to get the page fault handling latency.

if (is_pagefault)
{
  initial_delay = Sim()->getMimicOS()->getPageFaultLatency();
}

We iterate over every table and level in the page table hierarchy and calculate the latency of each memory access.

What is table? In the case of radix-based page tables, we only have a single table (the root table). However, in hash-based page tables we may have different tables for different page sizes.
What is level? The level corresponds to the "depth" that the memory request has been issued at. For example, in the case of radix-based page tables, the root request is at level 0, the next level is level 1, and so on. The final level's depth is 4. We use the concept of "level" so that we "serialize" the memory requests being sent to the memory hierarchy.

if (is_pagefault)
{
  initial_delay = Sim()->getMimicOS()->getPageFaultLatency();
}

for (int tab = 0; tab < (tables + 1); tab++)
{
  SubsecondTime fetch_delay = initial_delay;

  for (int level = 0; level < (levels + 1); level++)
  {

    for (UInt32 req = 0; req < accesses.size(); req++)
    {
      if (get<1>(accesses[req]) == level && get<0>(accesses[req]) == tab)
      {
#ifdef DEBUG_MMU
  log_file_mmu << "[MMU_BASE] We are accessing address: " << get<2>(accesses[req]) << " from level: " << level << " in table: " << tab << " at time: " << t_now+fetch_delay << std::endl;
#endif					


          
        packet.address = get<2>(accesses[req]);
        latency = accessCache(packet, t_now+fetch_delay);

#ifdef DEBUG_MMU
  log_file_mmu << "[MMU_BASE] We accessed address: " << get<2>(accesses[req]) << " from level: " << level << " in table: " << tab << " at time: " << t_now+fetch_delay << " with latency: " << latency << std::endl;
#endif
      
        if (get<3>(accesses[req]) == true)
        {
          correct_table = get<0>(accesses[req]);
          correct_level = get<1>(accesses[req]);
          latency_per_table_per_level[get<0>(accesses[req])][get<1>(accesses[req])] = latency;
          break;
        }
        else if (latency_per_table_per_level[get<0>(accesses[req])][get<1>(accesses[req])] < latency && level != correct_level)
          latency_per_table_per_level[get<0>(accesses[req])][get<1>(accesses[req])] = latency;
      }
    }
    fetch_delay += latency_per_table_per_level[tab][level];
#ifdef DEBUG_MMU
    log_file_mmu << "[MMU_BASE] Finished PTW for level: " << level << " at time: " << t_now << " with max latency: " << fetch_delay << std::endl;
#endif
  }
}

Overview​

Overview